76 research outputs found

    On the value of popular crystallographic databases for machine learning prediction of space groups

    Predicting crystal structure information is a challenging problem in materials science that clearly benefits from artificial intelligence approaches. The leading strategies in machine learning are notoriously data-hungry and although a handful of large crystallographic databases are currently available, their predictive quality has never been assessed. In this article, we have employed composition-driven machine learning models, as well as deep learning, to predict space groups from well-known experimental and theoretical databases. The results generated by comprehensive testing indicate that data-abundant repositories such as COD (Crystallography Open Database) and OQMD (Open Quantum Materials Database) do not provide the best models even for heavily populated space groups. Classification models trained on databases such as the Pearson Crystal Database and ICSD (Inorganic Crystal Structure Database), and to a lesser extent the Materials Project, generally outperform their data-richer counterparts due to more balanced distributions of the representative classes. Experimental validation with novel high entropy compounds was used to confirm the predictive value of the different databases and showcase the scope of the machine learning approaches employed.

    FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools

    Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most report only fragments of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer’s. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models, which achieved test set AUC values of 0.62–0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap
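
The abstract above reports test set AUC values. As a reminder of what that number measures, here is a minimal sketch that computes ROC AUC as the fraction of (active, inactive) pairs ranked correctly. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from FP-MAP.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen active (label 1)
    receives a higher score than a randomly chosen inactive (label 0).
    Tied scores count as half a win."""
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical predictions for five compounds (1 = active, 0 = inactive).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of compounds; it is chosen here for readability, and a rank-based implementation would be preferable for large test sets.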

    DENOPTIM: Software for Computational de Novo Design of Organic and Inorganic Molecules

    A general-purpose software package, termed DE Novo OPTimization of In/organic Molecules (DENOPTIM), for de novo design and virtual screening of functional molecules is described. Molecules of any element and kind, including metastable species and transition states, are handled as chemical objects that go beyond valence-rules representations. Synthetic accessibility of the generated molecules is ensured via detailed control of the kinds of bonds that are allowed to form in the automated molecular building process. DENOPTIM contains a combinatorial explorer for screening and a genetic algorithm for global optimization of user-defined properties. Estimates of these properties may be obtained to form the fitness function (figure of merit or scoring function) from external molecular modeling programs via shell scripts. Examples of a range of different fitness functions and DENOPTIM applications, including an easy-to-do test case, are described. DENOPTIM is available as Open Source from https://github.com/denoptim-project/DENOPTIM.

    Predicting Multicomponent Protein Assemblies Using an Ant Colony Approach

    Biological processes are often governed by functional modules of large protein assemblies such as proteasomes and the nuclear pore complex. However, atomic structures can be determined experimentally only for a small fraction of these multicomponent assemblies. In this article, we present an ant colony optimization based approach to predict the structure of large multicomponent complexes. Starting with pair-wise docking predictions, a multigraph consisting of vertices representing the component proteins and edges representing scored transformations is constructed. Thus, the assembly problem corresponds to identifying minimum weighted spanning trees that yield arrangements of components with few atomic clashes. The utility of the approach is demonstrated using protein complexes taken from the Protein Data Bank. Our algorithm was able to identify near-native solutions for 5 of the 6 cases tested, including one 6-component complex. This demonstrates that the ant colony model provides a useful way to deal with highly combinatorial problems such as assembling multicomponent protein complexes.
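
To make the graph formulation in this abstract concrete, the sketch below finds a minimum-weight spanning tree of a small multigraph in which parallel edges stand for alternative docking poses between the same pair of components. It uses Kruskal's deterministic algorithm as a stand-in; it is not the ant colony search described in the abstract, and it omits the atomic clash check entirely. The function name, edge weights and pose labels are all hypothetical.

```python
def minimum_spanning_tree(n_components, edges):
    """Kruskal's algorithm on a multigraph.

    edges: iterable of (weight, u, v, label); several edges may join the
    same pair (u, v), one per candidate docking transformation.
    Returns the chosen edges, cheapest first.
    """
    parent = list(range(n_components))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v, label in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate sub-assemblies
            parent[root_u] = root_v
            tree.append((weight, u, v, label))
    if len(tree) != n_components - 1:
        raise ValueError("pairwise predictions do not connect all components")
    return tree


# Hypothetical 4-component complex (0=A, 1=B, 2=C, 3=D); lower weight = better score.
edges = [
    (1.0, 0, 1, "A-B pose 1"),
    (2.5, 0, 1, "A-B pose 2"),
    (2.0, 1, 2, "B-C pose 1"),
    (1.5, 0, 2, "A-C pose 1"),
    (3.0, 2, 3, "C-D pose 1"),
]
tree = minimum_spanning_tree(4, edges)
total_weight = sum(weight for weight, *_ in tree)
for weight, u, v, label in tree:
    print(label, weight)
```

A single greedy tree like this ignores whether the placed components collide, which is presumably why a search over many low-weight trees is needed in practice.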

    Protein-protein docking using region-based 3D Zernike descriptors

    Background: Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, solving the structure of a complex experimentally is often difficult. Computational protein-protein docking approaches can provide a useful alternative. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results: We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation for using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing and then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion: We show for the first time that the 3D Zernike descriptors are adept at capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.

    Using Graphics Processors to Accelerate Protein Docking Calculations

    Protein docking is the computationally intensive task of calculating the three-dimensional structure of a protein complex starting from the individual structures of the constituent proteins. In order to make the calculation tractable, most docking algorithms begin by assuming that the structures to be docked are rigid. This article describes some recent developments we have made to adapt our FFT-based “Hex” rigid-body docking algorithm to exploit the computational power of modern graphics processors (GPUs). The Hex algorithm is very efficient on conventional central processing units (CPUs), yet significant further speed-ups have been obtained by using GPUs. Thus, FFT-based docking calculations which formerly took many hours to complete using CPUs may now be carried out in a matter of seconds using GPUs. The Hex docking program and access to a server version of Hex on a GPU-based compute cluster are both available for public use.
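
The idea behind FFT-based rigid-body docking is that an overlap score for every relative shift of two gridded shapes can be obtained at once through the correlation theorem, instead of scoring each shift separately. The sketch below shows this on a 1D periodic grid in pure Python. It is a generic illustration only: it is not the Hex algorithm, it does not use a GPU, and the grids, function names and toy "receptor" and "ligand" profiles are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_correlation(receptor, ligand):
    """Return c where c[s] = sum_i receptor[(i + s) % n] * ligand[i].

    Uses the correlation theorem: FFT(c) = FFT(receptor) * conj(FFT(ligand))
    for real-valued inputs.
    """
    n = len(receptor)
    if len(ligand) != n:
        raise ValueError("grids must have the same length")
    product = [r * l.conjugate() for r, l in zip(fft(receptor), fft(ligand))]
    return [value.real / n for value in fft(product, inverse=True)]


# Toy occupancy grids: the ligand overlaps the receptor best when shifted by 2.
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
scores = circular_correlation(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
print(best_shift, [round(v, 6) for v in scores])
```

For a grid of n points this replaces n separate overlap sums, each of cost n, with three transforms of cost n log n, and the same argument carries over to 3D grids.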

    HexServer: an FFT-based protein docking server powered by graphics processors

    HexServer (http://hexserver.loria.fr/) is the first fast Fourier transform (FFT)-based protein docking server to be powered by graphics processors. Using two graphics processors simultaneously, a typical 6D docking run takes ∼15 s, which is up to two orders of magnitude faster than conventional FFT-based docking approaches using comparable resolution and scoring functions. The server requires two protein structures in PDB format to be uploaded, and it produces a ranked list of up to 1000 docking predictions. Knowledge of one or both protein binding sites may be used to focus and shorten the calculation when such information is available. The first 20 predictions may be accessed individually, and a single file of all predicted orientations may be downloaded as a compressed multi-model PDB file. The server is publicly available and does not require any registration or identification by the user.

    Local initiation conditions for water autoionization

    The pH of liquid water is determined by the infrequent process in which water molecules split into short-lived hydroxide and hydronium ions. This reaction is difficult to probe experimentally and challenging to simulate. One of the open questions is whether the local water structure around a slightly stretched OH bond is actually initiating the eventual breakage of this bond or whether this event is driven by a global ordering that involves many water molecules far away from the reaction center. Here, we investigated the self-ionization of water at room temperature by rare-event ab initio molecular dynamics and obtained autoionization rates and activation energies in good agreement with experiments. Based on the analysis of thousands of molecular trajectories, we identified a couple of local order parameters and showed that if a bond stretch occurs when all these parameters are around their ideal range, the chance for the first dissociation step (double-proton jump) increases from 10⁻⁷ to 0.4. Understanding these initiation triggers might ultimately allow the steering of chemical reactions.

    Rapid determination of thyroid hormones in blood plasma from Glaucous gulls and Baikal seals by HybridSPE®-LC-MS/MS

    A rapid hybrid solid phase extraction (HybridSPE®) protocol tailored to liquid chromatography–electrospray ionization tandem mass spectrometry (LC–ESI–MS/MS) analysis was developed for the determination of four thyroid hormones, L-Thyroxine (T4), 3,3′,5-triiodo-L-thyronine (T3), 3,3′,5′-triiodo-L-thyronine (rT3) and 3,3′-diiodo-L-thyronine (T2), in blood plasma from Glaucous gulls (Larus hyperboreus) and Baikal seals (Phoca sibirica). The use of target analyte specific 13C internal standards allowed quantification to be performed through the standard solvent calibration curves and alleviated the need to perform quantification with matrix-matched curves. The relative recoveries were 100.0–110.1 % for T4, 99.1–102.2 % for T3, 100.5–108.0 % for rT3, and 100.5–104.6 % for T2. The matrix effects ranged from −1.52 to −6.10 %, demonstrating minor signal suppression during analysis. The method intra-day precision (method repeatability, RSD %, N = 5, k = 1 day) and inter-day precision (method reproducibility, RSD %, N = 10, k = 2 days) at the 1 ng/mL concentration of fortification were 8.54–15.4 % and 15.4–24.8 %, respectively, indicating acceptable chromatographic peak stabilities for all target THs even at trace level concentrations. The method limit of detection (LOD) for T4, T3, rT3 and T2 was 0.17, 0.16, 0.30 and 0.17 ng/mL, respectively. The HybridSPE® protocol was simple and rapid (<1 min) upon application, and the HybridSPE® cartridge did not require (as classical SPE cartridges do) any additional equilibration or conditioning step prior to sample loading. A total of 46 blood plasma samples, 30 samples collected from Glaucous gulls and 16 samples collected from Baikal seals, were analyzed for thyroid hormones to demonstrate the applicability of the developed method in these wildlife species. The concentrations of T4 and T3 in blood plasma from the Glaucous gulls were 5.95–44.2 and 0.37–5.61 ng/mL, respectively, whereas those from Baikal seals were 3.57–46.5 and 0.45–2.07 ng/mL, respectively. In both species, rT3 showed a low detection rate, while T2 was not detected. Furthermore, cross-array comparison between the HybridSPE®-LC-MS/MS protocol and an established routine radioimmunoassay (RIA) kit-based method was performed for T4 and T3 concentrations from selected Baikal seal plasma samples.
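
The precision figures in this abstract are relative standard deviations (RSD %), that is, the sample standard deviation of replicate measurements divided by their mean and expressed as a percentage. A minimal sketch of that calculation follows. The replicate values are hypothetical numbers chosen for illustration and are not taken from the study; the function name `rsd_percent` is likewise an assumption.

```python
from statistics import mean, stdev


def rsd_percent(replicates):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical N = 5 replicate results (ng/mL) for a 1 ng/mL fortified sample.
replicates = [0.9, 1.0, 1.1, 1.0, 1.0]
rsd = rsd_percent(replicates)
print(f"RSD = {rsd:.2f} %")
```

`stdev` uses the n − 1 denominator, which is the usual choice for a small set of replicates.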